The FCC (Fluidized Catalytic Cracking) condenser is a critical component in refinery operations. The FCC unit processes sweet Vacuum Gas Oil as feed and cracks the heavy oil into lighter products such as LPG and gasoline.
Any reduction in efficiency can indicate operational issues such as fouling, leaks, or mechanical failure. This project applies Machine Learning (ML) techniques to detect anomalies in FCC condenser efficiency.
The dataset contains operational parameters, including temperatures, pressures, flow rates, and velocities. Preprocessing consisted of loading the data, setting the time index, and standardizing each feature to zero mean and unit variance.
Exploratory Data Analysis (EDA) was conducted to visualize trends and detect potential anomalies.
1. Principal Component Analysis (PCA)
PCA reduces the high-dimensional data to a smaller set of principal components (here, from 46 features down to 10).
Anomalies are detected by flagging data points that lie far from the normal-operation cluster.
2. Autoencoder (Deep Learning)
A neural network is trained to reconstruct normal data patterns.
The reconstruction error is then used to flag high-error data points as anomalies.
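Both approaches share the same pattern: compute a per-sample error score from a model of normal operation, then flag samples whose score exceeds a control limit derived from the normal data. A minimal sketch of this thresholding idea on synthetic data (not the project's actual models; the distance-to-mean score and the injected fault are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 4))   # synthetic "normal operation" data
test = rng.normal(0.0, 1.0, size=(100, 4))
test[-5:] += 8.0                               # inject a gross fault in the last 5 samples

# Error score: distance from the centre of normal operation
center = train.mean(axis=0)
score = np.linalg.norm(test - center, axis=1)

# Control limit: 99th percentile of the score over normal data
limit = np.percentile(np.linalg.norm(train - center, axis=1), 99)
anomalies = np.where(score > limit)[0]         # the faulty samples land here
```

The PCA and autoencoder methods below follow this template, differing only in how the error score is defined.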
Importing libraries and datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
plt.rcParams['figure.figsize'] = [12, 6]
plt.rcParams.update({'font.size': 12})
df = pd.read_excel('columns.xlsx')
col = df['Symbol'].values
col = col.tolist()
df_stableFeedFlow = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_stableFeedFlow_outputs.csv',header=None)
df_stableFeedFlow.index = df_stableFeedFlow.iloc[:, 0]  # assign the index; .set_index is a method, not an attribute
df_stableFeedFlow = df_stableFeedFlow.drop(columns=0)
df_stableFeedFlow.columns= col
df_stableFeedFlow.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 632 | 164.85 | 79.961 | 460.98 | 34.4 | -6.4 | 28 | 2.6806 | 1562.7 | 616.00 | 969.05 | ... | 1644.7 | 212.78 | 2993.6 | 314.71 | 509.76 | 628.02 | 45.995 | 49.644 | 49.402 | 47.286 |
| 1759 | 165.28 | 78.490 | 460.25 | 34.4 | -6.4 | 28 | 2.6872 | 1569.8 | 616.01 | 969.04 | ... | 1647.0 | 209.61 | 3017.6 | 314.15 | 510.13 | 628.50 | 46.808 | 50.141 | 49.038 | 47.382 |
| 849 | 165.09 | 79.857 | 460.68 | 34.4 | -6.4 | 28 | 2.6847 | 1566.0 | 616.02 | 969.06 | ... | 1642.0 | 210.79 | 3006.4 | 314.88 | 509.92 | 628.18 | 45.873 | 49.829 | 48.649 | 47.118 |
| 2656 | 165.14 | 77.675 | 461.05 | 34.4 | -6.4 | 28 | 2.6848 | 1564.0 | 615.99 | 969.04 | ... | 1646.1 | 209.44 | 2987.6 | 313.13 | 509.96 | 628.23 | 48.028 | 49.813 | 48.804 | 47.349 |
| 1809 | 165.01 | 78.825 | 460.59 | 34.4 | -6.4 | 28 | 2.6801 | 1566.1 | 616.01 | 968.99 | ... | 1641.7 | 218.10 | 3003.6 | 314.25 | 509.93 | 628.41 | 46.728 | 49.886 | 49.767 | 47.099 |
5 rows × 46 columns
df_varyingFeedFlow=pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_varyingFeedFlow_outputs.csv',header=None)
df_varyingFeedFlow.index = df_varyingFeedFlow.iloc[:, 0]  # assign the index; .set_index is a method, not an attribute
df_varyingFeedFlow = df_varyingFeedFlow.drop(columns=0)
df_varyingFeedFlow.columns= col
df_varyingFeedFlow.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3259 | 164.68 | 78.932 | 461.14 | 34.4 | -6.4 | 28 | 2.6729 | 1560.6 | 616.00 | 968.96 | ... | 1635.6 | 215.15 | 2915.0 | 312.66 | 509.18 | 627.24 | 48.220 | 48.679 | 49.227 | 46.820 |
| 3602 | 163.01 | 79.983 | 461.18 | 34.4 | -6.4 | 28 | 2.6486 | 1550.1 | 616.00 | 969.00 | ... | 1610.3 | 227.94 | 2732.3 | 310.08 | 507.40 | 624.66 | 50.636 | 46.035 | 48.098 | 45.560 |
| 361 | 166.80 | 78.746 | 461.15 | 34.4 | -6.4 | 28 | 2.7136 | 1573.7 | 616.01 | 969.08 | ... | 1675.9 | 197.06 | 3256.9 | 318.27 | 512.04 | 631.08 | 42.794 | 53.392 | 50.467 | 48.867 |
| 6655 | 164.37 | 79.734 | 461.51 | 34.4 | -6.4 | 28 | 2.6710 | 1556.4 | 616.00 | 969.02 | ... | 1629.5 | 211.65 | 2875.8 | 312.44 | 508.82 | 626.51 | 48.268 | 48.059 | 48.031 | 46.516 |
| 6216 | 162.63 | 79.331 | 460.62 | 34.4 | -6.4 | 28 | 2.6415 | 1551.1 | 615.99 | 968.99 | ... | 1603.4 | 226.02 | 2637.2 | 307.75 | 506.56 | 623.31 | 53.173 | 44.809 | 47.564 | 45.250 |
5 rows × 46 columns
df_condEff_decrease = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/condEff_decrease_outputs.csv',header=None)
df_condEff_decrease.index = df_condEff_decrease.iloc[:, 0]  # assign the index; .set_index is a method, not an attribute
df_condEff_decrease = df_condEff_decrease.drop(columns=0)
df_condEff_decrease.columns= col
df_condEff_decrease.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 545 | 164.88 | 79.732 | 460.44 | 34.4 | -6.4 | 28 | 2.6792 | 1566.1 | 616.00 | 969.01 | ... | 1640.9 | 216.86 | 3044.1 | 317.97 | 509.83 | 628.21 | 42.217 | 49.937 | 49.582 | 47.062 |
| 1427 | 164.99 | 75.169 | 460.74 | 34.4 | -6.4 | 28 | 2.6798 | 1565.0 | 616.00 | 969.00 | ... | 1638.4 | 214.28 | 3025.6 | 317.74 | 509.68 | 627.93 | 42.521 | 49.668 | 49.428 | 46.934 |
| 1114 | 165.01 | 78.505 | 461.15 | 34.4 | -6.4 | 28 | 2.6823 | 1562.6 | 615.99 | 969.02 | ... | 1650.1 | 211.67 | 3094.6 | 320.84 | 509.93 | 628.25 | 39.116 | 50.300 | 50.083 | 47.580 |
| 340 | 164.59 | 78.586 | 460.67 | 34.4 | -6.4 | 28 | 2.6733 | 1562.9 | 615.99 | 968.98 | ... | 1640.6 | 223.55 | 2993.8 | 315.26 | 509.62 | 628.10 | 45.376 | 49.551 | 50.644 | 47.058 |
| 1051 | 165.03 | 78.948 | 460.82 | 34.4 | -6.4 | 28 | 2.6820 | 1564.8 | 616.00 | 969.02 | ... | 1641.1 | 212.26 | 3075.8 | 320.81 | 509.77 | 627.99 | 39.079 | 50.000 | 49.319 | 47.085 |
5 rows × 46 columns
EDA
df_condEff_decrease.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1440 entries, 0 to 1439
Data columns (total 46 columns): F3, Tatm, T1, P4, deltaP, P6, Fair, T3, T2, Tr, Treg, Lsp, Tcyc, Tcyc - Treg, Cco,sg, Co2,sg, P5, V4, V6, V7, V3, V1, V2, Frgc, Fsc, ACAB, AWGC, F5, F7, Fsg, FV11, P1, P2, FLPG, FLN, FHN, FLCO, FSlurry, FReflux, Tfra, T10, T20, V9, V8, V10, V11 — all 1440 non-null
dtypes: float64(41), int64(5)
memory usage: 517.6 KB
df_condEff_decrease.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| F3 | 1440.0 | 164.964931 | 1.705031e-01 | 164.480000 | 164.840000 | 164.970000 | 165.080000 | 165.520000 |
| Tatm | 1440.0 | 78.342074 | 1.495206e+00 | 75.014000 | 77.206250 | 78.782000 | 79.699250 | 80.057000 |
| T1 | 1440.0 | 460.921306 | 3.773500e-01 | 459.880000 | 460.660000 | 460.910000 | 461.190000 | 461.930000 |
| P4 | 1440.0 | 34.400000 | 8.884870e-13 | 34.400000 | 34.400000 | 34.400000 | 34.400000 | 34.400000 |
| deltaP | 1440.0 | -6.400001 | 1.366802e-05 | -6.400300 | -6.400000 | -6.400000 | -6.400000 | -6.400000 |
| P6 | 1440.0 | 28.000000 | 0.000000e+00 | 28.000000 | 28.000000 | 28.000000 | 28.000000 | 28.000000 |
| Fair | 1440.0 | 2.679885 | 3.071531e-03 | 2.671800 | 2.677700 | 2.680100 | 2.682000 | 2.689500 |
| T3 | 1440.0 | 1563.718958 | 2.493819e+00 | 1557.700000 | 1561.900000 | 1563.500000 | 1565.200000 | 1571.900000 |
| T2 | 1440.0 | 615.999910 | 6.663159e-03 | 615.980000 | 616.000000 | 616.000000 | 616.000000 | 616.020000 |
| Tr | 1440.0 | 968.999750 | 3.104291e-02 | 968.910000 | 968.980000 | 969.000000 | 969.020000 | 969.110000 |
| Treg | 1440.0 | 1250.001528 | 5.095774e-02 | 1249.900000 | 1250.000000 | 1250.000000 | 1250.000000 | 1250.100000 |
| Lsp | 1440.0 | 29.653098 | 9.399890e-02 | 29.376000 | 29.586000 | 29.655000 | 29.712000 | 29.905000 |
| Tcyc | 1440.0 | 1255.278958 | 5.024220e-02 | 1255.200000 | 1255.200000 | 1255.300000 | 1255.300000 | 1255.400000 |
| Tcyc - Treg | 1440.0 | 5.278488 | 3.785116e-02 | 5.186900 | 5.250175 | 5.280050 | 5.306625 | 5.388200 |
| Cco,sg | 1440.0 | 29881.611806 | 4.865322e+01 | 29737.000000 | 29846.000000 | 29880.000000 | 29917.250000 | 30009.000000 |
| Co2,sg | 1440.0 | 0.012470 | 1.691557e-04 | 0.012067 | 0.012344 | 0.012476 | 0.012599 | 0.012967 |
| P5 | 1440.0 | 24.900000 | 6.752501e-13 | 24.900000 | 24.900000 | 24.900000 | 24.900000 | 24.900000 |
| V4 | 1440.0 | 47.584573 | 1.025714e+00 | 45.364000 | 46.808000 | 47.828000 | 48.496000 | 48.969000 |
| V6 | 1440.0 | 24.783957 | 1.008147e-01 | 24.532000 | 24.708000 | 24.797000 | 24.866250 | 25.033000 |
| V7 | 1440.0 | 54.577801 | 6.263411e-02 | 54.413000 | 54.534000 | 54.582000 | 54.621000 | 54.773000 |
| V3 | 1440.0 | 46.982188 | 1.595010e-02 | 46.937000 | 46.971000 | 46.982000 | 46.993000 | 47.025000 |
| V1 | 1440.0 | 57.909178 | 1.800725e-01 | 57.470000 | 57.778500 | 57.893000 | 58.018500 | 58.493000 |
| V2 | 1440.0 | 45.315724 | 5.301285e-02 | 45.177000 | 45.283000 | 45.317000 | 45.350000 | 45.481000 |
| Frgc | 1440.0 | 49572.265972 | 6.081201e+01 | 49407.000000 | 49529.000000 | 49576.000000 | 49614.000000 | 49765.000000 |
| Fsc | 1440.0 | 49572.208333 | 6.126321e+01 | 49411.000000 | 49529.000000 | 49577.000000 | 49614.000000 | 49764.000000 |
| ACAB | 1440.0 | 280.687479 | 1.405453e+00 | 277.550000 | 279.710000 | 281.055000 | 281.910000 | 282.860000 |
| AWGC | 1440.0 | 213.537201 | 6.912833e+00 | 198.490000 | 208.370000 | 215.240000 | 219.650000 | 222.610000 |
| F5 | 1440.0 | 1989.637292 | 6.187052e+00 | 1974.600000 | 1985.175000 | 1989.100000 | 1993.400000 | 2009.700000 |
| F7 | 1440.0 | 3735.766111 | 5.533026e+00 | 3722.500000 | 3731.800000 | 3735.800000 | 3739.800000 | 3752.000000 |
| Fsg | 1440.0 | 160.793049 | 1.845840e-01 | 160.310000 | 160.660000 | 160.810000 | 160.920000 | 161.370000 |
| FV11 | 1440.0 | 29078.619444 | 7.483565e+02 | 27433.000000 | 28523.000000 | 29265.000000 | 29738.250000 | 30068.000000 |
| P1 | 1440.0 | 14.637990 | 1.015644e-04 | 14.637000 | 14.638000 | 14.638000 | 14.638000 | 14.638000 |
| P2 | 1440.0 | 35.044537 | 2.357305e-02 | 34.995000 | 35.026000 | 35.040000 | 35.063000 | 35.101000 |
| FLPG | 1440.0 | 3199.203681 | 1.250097e+02 | 2929.900000 | 3104.475000 | 3228.800000 | 3310.200000 | 3366.300000 |
| FLN | 1440.0 | 3751.406806 | 1.255117e+02 | 3582.700000 | 3640.875000 | 3721.100000 | 3844.300000 | 4029.600000 |
| FHN | 1440.0 | 711.221708 | 4.431817e+00 | 698.640000 | 708.587500 | 710.950000 | 713.720000 | 722.810000 |
| FLCO | 1440.0 | 1642.687292 | 4.569204e+00 | 1630.300000 | 1639.400000 | 1642.600000 | 1646.100000 | 1654.900000 |
| FSlurry | 1440.0 | 214.302257 | 4.559766e+00 | 201.700000 | 210.787500 | 214.325000 | 217.742500 | 225.080000 |
| FReflux | 1440.0 | 3038.856389 | 4.980395e+01 | 2923.500000 | 2999.575000 | 3050.200000 | 3077.500000 | 3127.300000 |
| Tfra | 1440.0 | 317.874403 | 3.343519e+00 | 310.700000 | 315.350000 | 318.700000 | 320.820000 | 322.170000 |
| T10 | 1440.0 | 509.777910 | 1.371906e-01 | 509.380000 | 509.680000 | 509.770000 | 509.870000 | 510.130000 |
| T20 | 1440.0 | 628.078812 | 1.482168e-01 | 627.680000 | 627.990000 | 628.080000 | 628.180000 | 628.400000 |
| V9 | 1440.0 | 42.438248 | 3.860931e+00 | 37.629000 | 39.092250 | 41.394500 | 45.168750 | 51.247000 |
| V8 | 1440.0 | 49.872215 | 3.110262e-01 | 49.160000 | 49.666000 | 49.880500 | 50.083500 | 50.627000 |
| V10 | 1440.0 | 49.723878 | 6.232966e-01 | 47.978000 | 49.351750 | 49.678500 | 50.085000 | 51.357000 |
| V11 | 1440.0 | 47.170649 | 2.520129e-01 | 46.496000 | 46.984000 | 47.164000 | 47.354250 | 47.844000 |
sns.heatmap(df_condEff_decrease.corr(),cmap='coolwarm')
for i in df_stableFeedFlow.columns:
    plt.figure(figsize=(12, 2))
    plt.plot(df_stableFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol'] == i]['Description'].values[0])
    plt.show()
for i in df_varyingFeedFlow.columns:
    plt.figure(figsize=(12, 2))
    plt.plot(df_varyingFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol'] == i]['Description'].values[0])
    plt.show()
for i in df_condEff_decrease.columns:
    plt.figure(figsize=(12, 2))
    plt.plot(df_condEff_decrease[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol'] == i]['Description'].values[0])
    plt.show()
Scaling the data to mean = 0 and std = 1 using StandardScaler.
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(df_stableFeedFlow)
Applying PCA
from sklearn.decomposition import PCA
pca = PCA()
X_pca = pca.fit_transform(X)
plt.figure(figsize=(15, 6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1, 47)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.title('Cumulative variance explained by the principal components')
plt.show()
It can be clearly seen that 10 dimensions describe more than 98% of the variance, hence the feature space is reduced from 46 to 10.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
plt.figure(figsize=(15, 6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1, 11)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.title('Cumulative variance explained by the principal components')
plt.show()
Applying Autoencoders
X_train = X.reshape(-1, 46, 1)  # 2880 samples × 46 features × 1 channel
Let's create a Sequential model with Bidirectional LSTM layers and train it on data from steady-state operation.
A 20% dropout after each LSTM layer is used to avoid overfitting.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256,return_sequences=True),input_shape=(46,1)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128,return_sequences=True)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape           Param #
=================================================================
 bidirectional (Bidirectional)    (None, 46, 512)        528384
 dropout (Dropout)                (None, 46, 512)        0
 bidirectional_1 (Bidirectional)  (None, 46, 256)        656384
 dropout_1 (Dropout)              (None, 46, 256)        0
 dense (Dense)                    (None, 46, 1)          257
=================================================================
Total params: 1185025 (4.52 MB)
Trainable params: 1185025 (4.52 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
model.fit(X_train,X_train,epochs=30)
Epoch 1/30  90/90 [==============================] - 66s 406ms/step - loss: 0.1392 - mae: 0.1968
Epoch 2/30  90/90 [==============================] - 37s 407ms/step - loss: 0.0027 - mae: 0.0359
...
Epoch 29/30 90/90 [==============================] - 22s 247ms/step - loss: 0.0014 - mae: 0.0243
Epoch 30/30 90/90 [==============================] - 22s 249ms/step - loss: 0.0014 - mae: 0.0243
Calculating the reconstruction error using MAE.
The 99th percentile of the training error is taken as the control limit: errors below it signify steady-state operation.
error_ae = []
for i in range(X.shape[0]):
    y_pred = model.predict(X[i].reshape(1, 46, 1), verbose=None)[0, :, 0]
    error_ae.append(np.abs(X[i] - y_pred).sum())
AE_CL = np.percentile(error_ae, 99)
pd.Series(error_ae).plot()
plt.hlines(AE_CL, 0, len(error_ae), colors='red', linestyles='--')
Calculating the reconstruction error using the Q-test, T²-test and cosine similarity.
Again, the 99th percentile of the training error is taken as the control limit for steady-state operation.
X_reconstructed = np.dot(X_pca,pca.components_)
error_pca = X-X_reconstructed
Q_train = np.sum(np.abs(error_pca),axis=1)
Q_CL = np.percentile(Q_train,99)
# Q_train plot with CL
plt.figure()
plt.plot(Q_train, color='black')
plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Q metric: training data')
plt.title(f'Q metric is max: {Q_train.max()} at: {Q_train.argmax()} mins')
plt.show()
lambda_ = np.diag(pca.explained_variance_)
lambda_inv = np.linalg.inv(lambda_)
T_train = np.zeros(X_pca.shape[0])
for i in range(X_pca.shape[0]):
    T_train[i] = np.dot(np.dot(X_pca[i], lambda_inv), X_pca[i].T)
T_CL = np.percentile(T_train,99)
# T2_train plot with CL
plt.figure()
plt.plot(T_train, color='black')
plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('T$^2$ metric: training data')
plt.title(f'T$^2$ metric is max: {T_train.max()} at: {T_train.argmax()} mins')
plt.show()
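As a side note, the per-sample loop for T² can be vectorized: because the eigenvalue matrix is diagonal, T² is simply the sum of squared scores divided by the corresponding eigenvalues. A small self-contained check of the equivalence, using random stand-ins for `X_pca` and `pca.explained_variance_`:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(size=(50, 10))       # stand-in for X_pca
var = rng.uniform(0.5, 2.0, size=10)     # stand-in for pca.explained_variance_

# Loop form, as above: T²[i] = scores[i] @ inv(diag(var)) @ scores[i].T
lambda_inv = np.linalg.inv(np.diag(var))
T_loop = np.array([np.dot(np.dot(s, lambda_inv), s.T) for s in scores])

# Vectorized form: square elementwise, divide by the eigenvalues, sum per row
T_vec = np.sum(scores**2 / var, axis=1)

assert np.allclose(T_loop, T_vec)
```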
cosine = []
ed = []
X_reconstructed = np.dot(X_pca, pca.components_)  # hoisted out of the loop
for i in range(X.shape[0]):
    v1 = X[i]
    v2 = X_reconstructed[i]
    cosine.append(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    ed.append(np.linalg.norm(v1 - v2))
C_CL = np.min(cosine)
E_CL = np.percentile(ed, 99)
pd.Series(cosine).plot(color='black')
plt.plot([1, len(cosine)], [C_CL, C_CL], linestyle='--', color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Cosine similarity metric: training data')
plt.title('Cosine Similarity')
plt.show()
Q_CL,T_CL,C_CL,E_CL,AE_CL
(4.123113215519066, 20.4243503525688, 0.9427550112367161, 0.9281694746755804, 0.4919710296664876)
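With the control limits in hand, a new (already scaled) sample can be screened against the PCA metrics in one call. The `metrics` and `is_anomalous` helpers below are hypothetical additions, sketched on a freshly fitted PCA over random stand-in data rather than the notebook's actual `pca` and limits:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X_norm = rng.normal(size=(500, 8))           # stand-in for the scaled NOC training data
pca = PCA(n_components=4).fit(X_norm)

def metrics(x, pca):
    """Q (sum of absolute reconstruction residuals) and T² for one scaled sample."""
    scores = pca.transform(x.reshape(1, -1))[0]
    recon = scores @ pca.components_ + pca.mean_
    q = np.abs(x - recon).sum()
    t2 = np.sum(scores**2 / pca.explained_variance_)
    return q, t2

# Control limits: 99th percentile of each metric over the training data
qs, t2s = zip(*(metrics(x, pca) for x in X_norm))
Q_CL, T_CL = np.percentile(qs, 99), np.percentile(t2s, 99)

def is_anomalous(x, pca, Q_CL, T_CL):
    q, t2 = metrics(x, pca)
    return (q > Q_CL) or (t2 > T_CL)

faulty = X_norm[0] + 10.0                    # gross deviation: should violate the limits
```

The same pattern extends to the cosine-similarity and autoencoder limits: a sample is treated as abnormal if any metric violates its control limit.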
Let's create functions to preprocess the test data and evaluate it with our models.
def Q_test(X, X_pca, pca_components_, Q_CL):
    X_reconstructed = np.dot(X_pca, pca_components_)
    error_pca = X - X_reconstructed
    Q = np.sum(np.abs(error_pca), axis=1)
    # Q plot with control limit
    plt.figure()
    plt.plot(Q, color='black')
    plt.plot([1, len(Q)], [Q_CL, Q_CL], linestyle='--', color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Q metric: test data')
    plt.title(f'Q metric is max: {Q.max()} at: {Q.argmax()} mins')
    plt.show()
    return error_pca
def T_test(X_pca, explained_variance_, T_CL):
    lambda_inv = np.linalg.inv(np.diag(explained_variance_))
    T = np.zeros(X_pca.shape[0])
    for i in range(X_pca.shape[0]):
        T[i] = np.dot(np.dot(X_pca[i], lambda_inv), X_pca[i].T)
    # T² plot with control limit
    plt.figure()
    plt.plot(T, color='black')
    plt.plot([1, len(T)], [T_CL, T_CL], linestyle='--', color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('T$^2$ metric: test data')
    plt.title(f'T$^2$ metric is max: {T.max()} at: {T.argmax()} mins')
    plt.show()
def cosine(X, X_transformed, pca_components_, C_CL, E_CL):
    cos_vals = []
    ed = []
    X_reconstructed = np.dot(X_transformed, pca_components_)
    for i in range(X.shape[0]):
        v1 = X[i]
        v2 = X_reconstructed[i]
        cos_vals.append(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        ed.append(np.linalg.norm(v1 - v2))
    pd.Series(cos_vals).plot(color='black')
    plt.plot([1, len(cos_vals)], [C_CL, C_CL], linestyle='--', color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Cosine similarity metric: test data')
    plt.title('Cosine Similarity')
    plt.show()
def autoencoder(df_test, CL):
    X_test = ss.transform(df_test)
    error_ae = []
    error_sum = []
    for i in range(X_test.shape[0]):
        y_pred = model.predict(X_test[i].reshape(1, 46, 1), verbose=None)[0, :, 0]
        error_ae.append(np.abs(X_test[i] - y_pred))
        error_sum.append(np.abs(X_test[i] - y_pred).sum())
    error_ae = np.array(error_ae)
    pd.Series(error_sum).plot(color='black')
    plt.hlines(CL, 0, len(error_ae), colors='red', linestyles='--')
    plt.xlabel('Sample #')
    plt.ylabel('Reconstruction error by Autoencoder')
    plt.show()
    return error_ae
X = ss.transform(df_varyingFeedFlow)
X_test = pca.transform(X)
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
T_test(X_test,pca.explained_variance_,T_CL)
cosine(X,X_test,pca.components_,C_CL,E_CL)
error_ae = autoencoder(df_varyingFeedFlow,AE_CL)
X = ss.transform(df_condEff_decrease)
X_test = pca.transform(X)
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
T_test(X_test,pca.explained_variance_,T_CL)
cosine(X,X_test,pca.components_,C_CL,E_CL)
error_ae = autoencoder(df_condEff_decrease,AE_CL)
Inference
During steady-state operation the errors stay within the limits, but for the condenser-efficiency-decrease case the error suddenly starts increasing after about 700 mins.
So, let's check which parameters deviate the most from steady state.
We consider the top 10 variables responsible for the plant deviation.
Q test Error
#%% Q contribution
# Find the first point after which 15 consecutive samples exceed the control limit
error = np.abs(error_pca).sum(axis=1)
cum = []
for index, value in enumerate(error):
    if (value > Q_CL) and (len(cum) < 15):
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum = []
print('Time-', sample, 'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample)  # vector of contributions
Time- 219 mins
plt.figure(figsize=[15,4])
plt.bar(['variable ' + str((i+1)) for i in range(len(Q_contri))], Q_contri)
plt.xticks(rotation = 80)
plt.ylabel('Q contributions')
plt.show()
plt.figure(figsize=(15, 40))
print('Time-', sample, 'mins')
for i, n in enumerate(np.argsort(Q_contri)[:-11:-1]):
    plt.subplot(5, 2, i + 1)
    plt.plot(df_condEff_decrease.iloc[:, n], 'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
plt.show()
Time- 219 mins
Autoencoder Error
#%% Autoencoder Error
# Find the first point after which 15 consecutive samples exceed the control limit
error = np.abs(error_ae).sum(axis=1)
cum = []
for index, value in enumerate(error):
    if (value > AE_CL) and (len(cum) < 15):
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum = []
print('Time-', sample, 'mins')
error_test_sample = error_ae[sample]
AE_contri = np.abs(error_test_sample)  # vector of contributions
Time- 481 mins
plt.figure(figsize=(15, 45))
print('Time-', sample, 'mins')
for i, n in enumerate(np.argsort(error_ae[sample])[:-11:-1]):
    plt.subplot(5, 2, i + 1)
    plt.plot(df_condEff_decrease.iloc[:, n], 'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
plt.show()
Time- 481 mins
It is clearly visible from the plots that the LPG flow rate is continuously increasing while the LN flow rate is decreasing. Consequently, the fractionator temperature and pressure CVs are opening, and the WGC flow, reflux flow, and receiver LV start closing.
From this we can conclude that condensation in the MC O/H condensers has probably reduced, which results in an increase in LPG flow rate (vapour) and a decrease in LN flow rate (liquid).
This increased vapour flow rate raises the fractionator pressure and the load on the WGC.